INFORMATION PROCESSING APPARATUS
AND METHOD OF CONTROLLING MEMORY THEREOF 

BACKGROUND OF THE INVENTION 
1. Field of the Invention:

The present invention relates to an information 
processing apparatus, and more particularly to a vector 
information processing apparatus for parallel processing 
of information on a hardware basis. 
2. Description of the Related Art:

Information processing apparatus in recent years
suffer greater signal delays caused by transmission
lines as their operating frequencies rise. In such
information processing apparatus, it is very difficult
to operate a plurality of semiconductor integrated
circuits (CPUs, LSI circuits, etc.) with clock signals
that are kept in phase with each other.

One solution to the above problem is proposed as a
method of synchronizing, on a software basis, the
processes carried out by a plurality of CPUs that
operate with asynchronous clock signals. For example,
there is known a method of dispatching a plurality of
designated processes to CPUs that operate under
different OSs (Operating Systems) using a hardware
function referred to as a barrier
synchronization/communication register. Since this
method is based on the premise that the plural processes
operate at entirely different timings, operation
failures on account of the hardware function do not
occur even if the clock signals of the respective CPUs
are out of synchronism with each other. The method has
been implemented in products called scalar parallel
computers, for example.

The above method, which synchronizes a plurality of
processes on a software basis, has an increased apparent
performance vs. cost ratio because it can be achieved
much more inexpensively than attempts to improve
hardware performance such as CPU operating speeds and
data transfer speeds between CPUs and memories.

However, the above method is problematic in that it
is highly difficult to parallelize programs. The
difficulty arises from the fact that instructions used
in programs have a wide variety of different limitations
on parallelization. Even if programs can be
parallelized, debugging them is much more difficult than
debugging programs that are not parallelized. The
debugging process is generally carried out when
performance tuning is effected on the information
processing apparatus, and requires a high level of skill
in parallel processing technology. Inasmuch as the
difficult debugging process needs to be carried out each
time hardware improvements are introduced, vast program
resources are made useless. Even if technical goals for
parallelizing programs are accomplished, another problem
is encountered in that sufficient human resources are
not available for operating the programs on site.

The above problems may be solved by the parallel 
processing of information on a hardware basis. One 
specific example of such a solution is known as a vector 
information processing apparatus. 

A vector process is a process (Single Instruction
Multiple Data stream: SIMD) for simultaneously
processing a plurality of regularly arranged data. A
register which stores such a plurality of regularly
arranged data is referred to as a vector register, and
instructions for performing the same operation on,
effecting memory access to, and transferring, all the
elements stored in the vector register are referred to
as vector instructions.

A vector instruction is described, for example, as: 

    LVL  VL <- 128

    VADD V7 <- V5 + V4

In this example, the number of elements (128 elements)
to be processed is stored in a VL (vector length
register) using an LVL (Load VL) instruction, after
which elements (128 elements) in vector registers V5, V4
are added using a VADD (vector addition) instruction,
and the resultant sum is stored in a vector register V7.
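
The two instructions above can be pictured with a short
sketch. It is only an illustration in Python: the function
name vadd, the list representation of vector registers, and
the register contents are assumptions made for this example
and are not part of the apparatus described here.

```python
# Hypothetical emulation of the LVL / VADD example (illustration only).

def vadd(va, vb, vl):
    """Add the first vl elements of two vector registers element by element."""
    return [a + b for a, b in zip(va[:vl], vb[:vl])]

VL = 128                 # LVL  VL <- 128
V5 = list(range(VL))     # assumed register contents: 0, 1, ..., 127
V4 = [10] * VL           # assumed register contents: all 10
V7 = vadd(V5, V4, VL)    # VADD V7 <- V5 + V4
```

A single call processes all 128 elements, which mirrors the
point that one vector instruction operates on every element
held in the vector registers.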

According to the vector process, since software-based
synchronization between processes is not required,
software can be developed along the same lines as for a
single CPU. The vector process has actually been used
effectively as a parallelizing process, and a compiler
for parallelization already exists.

For improving performance with the vector process,
however, a bandwidth (data transfer speed) commensurate
with the performance to be improved needs to be kept
between a CPU and a memory. If the CPU comprises a
plurality of vector units for executing vector
instructions and the vector units are operated in
parallel with each other, then processing operations can
be performed at a higher speed.

The vector information processing apparatus which
has a CPU comprising a plurality of vector units suffers
problems to be described below when a VSC (vector
scatter) instruction is executed.

The VSC instruction is a very important instruction
in the vector information processing apparatus.
Specifications of the VSC instruction will be described
below with reference to FIG. 1 of the accompanying
drawings.

As shown in FIG. 1, the VSC instruction is an
instruction which uses the elements in a vector register
Vy designated by a Y field and stores the corresponding
elements in a vector register Vz designated by a Z field
in a memory. In FIG. 1, an OPC field is an operation
code indicative of a VSC code, and an X field is an
invalid area which is not used.

In a process according to a VSC instruction,
elements are successively written into a memory in the
sequence of element numbers. For storing a plurality of
elements at the same address, in particular, priority
has to be given to the writing of an element having a
larger element number. For example, when an element n
and an element n+1 are to be stored at the same memory
address, it is necessary to give priority to the writing
of the element n+1 and invalidate the element n. If the
process is carried out by a single unit or a plurality
of units that operate in synchronism with each other as
is conventional, the above limitation does not need to
be taken into account since writing requests are issued
in the sequence of element numbers from one port.
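
The ordering rule described above can be sketched as a
reference model. This is a minimal illustration, assuming
that vector register Vy supplies the storage addresses and
vector register Vz supplies the data; the dictionary used as
a memory and the name vsc_reference are hypothetical.

```python
# Reference behaviour of a vector scatter (illustration only).
# Assumption: vy holds storage addresses, vz holds the data to store.

def vsc_reference(memory, vy_addresses, vz_data):
    """Store vz_data[i] at vy_addresses[i] in ascending element order."""
    for addr, value in zip(vy_addresses, vz_data):
        # Writing in element order means the larger element number
        # is the one that remains when addresses collide.
        memory[addr] = value
    return memory

# Elements 1 and 2 target the same address 104.
mem = vsc_reference({}, [100, 104, 104, 108], ["e0", "e1", "e2", "e3"])
```

Because the writes are applied in element order, address
104 ends up holding the data of element 2, the larger
element number.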

In the vector information processing apparatus
where the CPU comprises a plurality of asynchronously
operating units, since the sequence of processing based
on element writing requests (hereinafter referred to as
element requests) issued from the units is not
guaranteed, the sequence of writing requests in the
memory may be reversed.

For example, assume that, as shown in FIG. 2 of the
accompanying drawings, a CPU comprises a master unit and
a slave unit which are asynchronously operating units,
and that element requests of adjacent element numbers
(an element n and an element n+1) are distributed to and
issued from the master and slave units. If the element
requests are requests for storing the element n and the
element n+1 at the same memory address, then a memory
controller for controlling the writing of elements in
the memory may possibly process the element n+1 prior to
the element n. If the element n+1 is written prior to
the element n, then the element n is written over the
element n+1, and the wrong data remains in the memory.
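
The hazard can be reproduced with a small sketch. It assumes
a memory controller that simply applies requests in arrival
order, with no protection; the tuple layout and the function
name are hypothetical and chosen only for this illustration.

```python
# Unprotected memory controller model (illustration only).

def apply_in_arrival_order(requests):
    """requests: (element_number, address, data) in the order processed."""
    memory = {}
    for _element_number, addr, data in requests:
        memory[addr] = data          # the last arrival wins
    return memory

# Element n = 4 and element n+1 = 5 target the same address.
in_order = apply_in_arrival_order([(4, 0x40, "n"), (5, 0x40, "n+1")])
reversed_ = apply_in_arrival_order([(5, 0x40, "n+1"), (4, 0x40, "n")])
```

When the requests arrive in element order the address holds
the data of element n+1, as required. When the arrival order
is reversed, the data of element n remains instead.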

The above problem may be solved by synchronizing
element requests issued from a plurality of
asynchronously operating units. This solution, however,
requires increased overhead for synchronizing element
requests, and results in increased intervals at which
the element requests are issued. These drawbacks cancel
out the advantages provided by a high-speed processing
apparatus based on parallel operation of the master and
slave units.

SUMMARY OF THE INVENTION 
It is an object of the present invention to provide
an information processing apparatus which has a CPU
comprising a plurality of asynchronously operating units
and is capable of eliminating adverse effects caused
when a processing sequence of element requests from the
units is reversed, and a method of controlling a memory
of such an information processing apparatus.

To achieve the above object, a vector information
processing apparatus according to the present invention
has a CPU comprising a plurality of asynchronously
operating units, a main memory for storing data, and a
main memory controller for controlling the writing of
data in the main memory. The main memory controller has
a VSC address buffer for holding a storage address in
the main memory for each element designated by a vector
scatter instruction. Of the writing requests for writing
elements in the main memory which are issued
respectively from the asynchronously operating units
according to a vector scatter instruction, the main
memory controller inhibits the outputting of a writing
permission signal for the main memory which is generated
according to a writing request for writing an element
having a smaller element number, which has the same
storage address and which has not been processed in the
sequence of element numbers.
With the above arrangement, the CPU may comprise a
plurality of asynchronously operating units because
adverse effects caused when a processing sequence among
a plurality of asynchronously operating units is
reversed are eliminated. It is thus possible to reduce
the circuit scale of each unit, and the yield of the
units (LSI circuits) is increased and the number of
external terminals thereof is reduced, resulting in a
reduction in the cost of the information processing
apparatus.

In the information processing apparatus, the main 
memory controller comprises a VSC address buffer 
controller for controlling the VSC address buffer to 
hold the storage address sent from the asynchronously 
operating units and, if the VSC address buffer suffers 
an overflow, requests the asynchronously operating unit 
which has issued a vector scatter instruction that has 
caused the overflow to resend the element, and the 
asynchronously operating unit has a retry buffer for 
holding each element designated by the vector scatter 
instruction issued thereby, and resends an element held 
by the retry buffer to the main memory controller if 
requested by the main memory controller to resend the 
element.

With the above arrangement, since the VSC address
buffer can be used efficiently, the scale of the VSC
address buffer can be reduced, and the cost of the main
memory can be lowered.

The above and other objects, features, and
advantages of the present invention will become apparent
from the following description with reference to the
accompanying drawings which illustrate examples of the
present invention.

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a diagram showing the specifications of a
VSC instruction which is used by a vector computer;

FIG. 2 is a diagram showing the manner in which a
CPU comprises a master unit and a slave unit and
elements are stored in a main memory unit by a VSC
instruction;

FIG. 3 is a block diagram of an information 
processing apparatus according to a first embodiment of 
the present invention; 

FIG. 4 is a block diagram of a main memory
controller in the information processing apparatus shown
in FIG. 3;

FIG. 5 is a diagram showing the manner in which the 
information processing apparatus shown in FIG. 3 
operates under a VSC instruction; 

FIG. 6 is a block diagram of a CPU in an
information processing apparatus according to a second 
embodiment of the present invention; 

FIG. 7 is a block diagram of a main memory 
controller in the information processing apparatus 
according to the second embodiment of the present 
invention; 

FIG. 8 is a block diagram of a deadlock detection
controller in the main memory controller shown in
FIG. 7; and

FIG. 9 is a diagram showing the manner in which the 
deadlock detection controller shown in FIG. 8 operates. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
1st Embodiment:

FIG. 3 shows in block diagram an information
processing apparatus according to a first embodiment of
the present invention, and FIG. 4 shows in block diagram
a main memory controller in the information processing
apparatus shown in FIG. 3.

As shown in FIG. 3, the information processing
apparatus according to the first embodiment generally
comprises CPU (Central Processing Unit) 100 and MMU
(Main Memory Unit) 200.

CPU 100 has two asynchronously operating units,
i.e., master unit 1 and slave unit 2. Basically, master
unit 1 and slave unit 2 carry out the same processing
according to vector instructions except that master unit
1 manages issuance/ending of vector instructions. These
vector units are usually in the form of two different
LSI circuits, but may be incorporated in a single LSI
circuit insofar as they operate asynchronously.

In FIG. 3, CPU 100 is shown as comprising single 
master unit 1 and single slave unit 2. However, CPU 100 
may comprise single master unit 1 and a plurality of 
slave units 2, or may comprise a plurality of sets each
having single master unit 1 and at least one slave
unit 2.

Master unit 1 and slave unit 2 shown in FIG. 3 are 
arranged to process 256 elements simultaneously
according to VSC instructions. It is assumed that 
master unit 1 processes even-numbered elements and slave 
unit 2 processes odd-numbered elements. 

Master unit 1 has instruction controller 11,
master-side RQ control circuit 12, vector register 13,
and PNU 14. Slave unit 2 has slave-side RQ control
circuit 22, vector register 23, and PNU 24.

Instruction controller 11 supplies master-side RQ 
control circuit 12 and slave-side RQ control circuit 22 
with corresponding operation codes when a vector 
instruction is issued. 



In response to the operation code of the VSC
instruction, master-side RQ control circuit 12 checks
all conditions for executing the VSC instruction, and
sends a VSCstart signal for starting processing at a
predetermined timing to slave-side RQ control circuit 22.
Master-side RQ control circuit 12 issues element
requests for even-numbered elements, and sends request
information including an element number, unit number
CPU#, identifier VSCid, etc., together with a storage
address and data to be stored in a memory which are
designated by the VSC instruction and read from vector
register 13, to PNU 14. Unit number CPU# serves to
distinguish units which have issued element requests.
For example, unit number CPU# of request information
sent from master unit 1 is assigned an even number, and
unit number CPU# of request information sent from slave
unit 2 is assigned an odd number. However, if the CPU
comprises a plurality of sets of master units 1 and
slave units 2, then unit numbers CPU# assigned to one
set of master unit 1 and slave unit 2 are set to
successive values. Identifier VSCid serves to
distinguish VSC instructions, and one identifier is
assigned to all element requests that are issued by a
single VSC instruction.
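
The request information described above can be pictured as a
small record. The following is a sketch under stated
assumptions: the class name ElementRequest, the field names,
and the helper issue_requests are hypothetical, and unit
number CPU# is taken to be 0 for master unit 1 and 1 for
slave unit 2 purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementRequest:
    element_number: int   # position of the element in the vector
    cpu_number: int       # unit number CPU# (assumed 0 = master, 1 = slave)
    vsc_id: int           # identifier VSCid, shared by one VSC instruction
    address: int          # storage address read from the vector register
    data: int             # data to be stored

def issue_requests(vsc_id, addresses, data, unit):
    """Build the requests one unit issues: even elements for unit 0,
    odd elements for unit 1, in ascending element order."""
    return [
        ElementRequest(i, unit, vsc_id, addresses[i], data[i])
        for i in range(unit, len(addresses), 2)
    ]

master_rqs = issue_requests(7, [0x10, 0x18, 0x20, 0x28], [1, 2, 3, 4], unit=0)
slave_rqs = issue_requests(7, [0x10, 0x18, 0x20, 0x28], [1, 2, 3, 4], unit=1)
```

Every request built for one VSC instruction carries the same
vsc_id, which is what later allows requests of different VSC
instructions to be told apart.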

PNU 14 holds element requests, request information,
storage address and data received from master-side RQ
control circuit 12, temporarily in CPU output RQ
register 15, and then sends them to MMU 200.

Slave-side RQ control circuit 22 processes
odd-numbered elements in the same manner as master-side
RQ control circuit 12, and sends a VSCend signal, which
indicates a processing end at the issuance timing of a
final one of a plurality of element requests that are
issued by one VSC instruction, to master-side RQ control
circuit 12.

In response to the VSCend signal, master-side RQ
control circuit 12 starts processing a next VSC
instruction that is issued from instruction controller
11 after having waited for a final element request
issued by master-side RQ control circuit 12.

Vector register 23 of slave unit 2 operates in the
same manner as vector register 13 of master unit 1, and
PNU 24 of slave unit 2 operates in the same manner as
PNU 14 of master unit 1.

MMU 200 has MM (Main Memory) 4 for storing data and
MMC (Main Memory Controller) 3 for controlling the
writing of data into MM 4. In FIG. 3, the information
processing apparatus is shown as having single MMU 200.
However, the information processing apparatus is not
limited to single MMU 200, but may have a plurality of
MMUs 200.
MM 4 comprises a plurality of (eight in FIG. 3)
memory units which are called banks and which can be
accessed in parallel with each other. Each of the banks
stores data in an interleaved pattern at designated
addresses.
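
One common way to interleave data across banks is sketched
below. The text states only that eight banks are used and
that data is interleaved; the word-address modulo mapping and
the function names are assumptions made for illustration.

```python
# Assumed low-order interleaving across eight banks (illustration only).
NUM_BANKS = 8

def bank_of(word_address):
    """Bank selected by the low-order part of the word address."""
    return word_address % NUM_BANKS

def offset_in_bank(word_address):
    """Location of the word inside the selected bank."""
    return word_address // NUM_BANKS
```

Under this assumed mapping, consecutive word addresses fall
in different banks, so successive element requests can be
served by different banks in parallel.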

MMC 3 comprises MMU input RQ registers 31_1, 31_2, VSC
address buffer 32, request distributing circuit 33,
sequence monitoring circuit 34, WE register 35, and
adr/data register 36.

MMU input RQ registers 31_1, 31_2 are registers for
receiving request information, storage addresses and
data sent from master unit 1 and slave unit 2, and are
associated with master unit 1 and slave unit 2,
respectively.

VSC address buffer 32 is a register for temporarily
storing storage addresses for storing elements in MM 4.
VSC address buffer 32 has two registers, one for each of
master unit 1 and slave unit 2 shown in FIG. 3, each
holding addresses of 256 W (words) so that 256 elements
can be processed simultaneously. In the present
embodiment, VSC address buffer 32 holds only storage
addresses corresponding to one VSC instruction, and
does not store overlapping addresses which correspond to
a plurality of VSC instructions.

Request distributing circuit 33 distributes a
Mem-Write signal for permitting the writing of an
element, and a designated storage address and data to
the banks of MM 4 which correspond to the storage
addresses for elements.

If element requests issued by one VSC instruction
include element requests which have the same storage
address and which are not processed in the sequence of
element numbers, then sequence monitoring circuit 34
inhibits the outputting of a Mem-Write signal generated
by an element request of a smaller element number, among
those element requests.
WE register 35 holds Mem-Write signals sent from
request distributing circuit 33, and sends the Mem-Write
signals to the banks of MM 4 which correspond to storage
addresses.

Adr/data register 36 holds storage addresses and
data sent from request distributing circuit 33, and
sends them to the banks of MM 4 which are designated by
storage addresses.

As shown in FIG. 4, sequence monitoring circuit 34
comprises comparing circuit 37 for comparing element
numbers and storage addresses of a preceding element
request and a following element request with each other,
WE inhibiting circuit 38 for inhibiting the outputting
of a Mem-Write signal generated by an element request of
a smaller element number if the sequence of processing
is reversed and the storage addresses agree with each
other as a result of the comparison by comparing circuit
37, and buffer control circuit 39 for controlling the
writing of storage addresses into VSC address buffer 32.

Comparing circuit 37 comprises element number
comparator 371 for comparing the element numbers of a
preceding element request and a following element
request which are stored in VSC address buffer 32 with
each other thereby to detect a reversal of the sequence
of processing, address comparator 372 for comparing the
storage addresses of a preceding element request and a
following element request which are stored in VSC
address buffer 32 with each other thereby to detect an
access to the same address, and AND gate 373 for
outputting the result of ANDing of the compared result
from element number comparator 371 and the compared
result from address comparator 372. Each of element
number comparator 371, address comparator 372, and AND
gate 373 is provided as 256 x 2 = 512 units
corresponding to the number of words (the number of
entries) that can be stored in VSC address buffer 32.

If a following element request is issued from
master unit 1, then address comparator 372 compares the
address of the following element request and the address
of a preceding element request issued from slave unit 2.
If a following element request is issued from slave unit
2, then address comparator 372 compares the address of
the following element request and the address of a
preceding element request issued from master unit 1.
This is because element requests successively issued
from the same unit, master unit 1 or slave unit 2, do
not reverse the sequence of processing.
When the output of AND gate 373 goes active, it
indicates that the sequence of processing element
requests issued from master unit 1 and slave unit 2 is
reversed, and that a wrong writing event would occur in
MM 4.
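
The per-entry check made by comparing circuit 37 can be
sketched as follows. This is a software analogy under stated
assumptions, not the circuit itself: the function name and
the (element_number, address) tuples are hypothetical, and
"preceding" denotes the request already held in the buffer
while "following" denotes the request that has just arrived.

```python
# Software analogy of one comparator pair and its AND gate (illustration only).

def reversal_detected(preceding, following):
    """preceding, following: (element_number, address) tuples."""
    preceding_elm, preceding_addr = preceding
    following_elm, following_addr = following
    order_reversed = preceding_elm > following_elm   # element number comparator 371
    same_address = preceding_addr == following_addr  # address comparator 372
    return order_reversed and same_address           # AND gate 373
```

Only the combination of a reversed order and an identical
storage address produces an active output. Either condition
alone is harmless.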

Buffer control circuit 39 comprises writing
controller 391, decoding circuit 392, VSCid register 393,
VSCid comparator 394, and inverter 395.

Decoding circuit 392 decodes operation codes sent
from master unit 1 and slave unit 2, and generates a VSC
signal indicative of a VSC instruction and a Mem-Write
signal which is a signal for permitting the writing of
data into MM 4.

Writing controller (Write con) 391 is a logic
circuit which is supplied with a valid (V) signal
indicating that an element is valid, unit number CPU#,
and a VSC signal. When an element is valid, writing
controller 391 searches VSC address buffer 32 which
corresponds to the issuing source of the element
request, and sends a writable (empty) address and a
write enable (WE) signal to VSC address buffer 32.

VSC address buffer 32 stores storage addresses
(Address) transferred from MMU input RQ registers 31_1,
31_2 in association with element numbers (ELM) according
to an address and a WE signal sent from writing
controller 391.

VSCid register 393 holds identifier VSCid allotted
to a preceding element request. VSCid comparator 394
compares identifier VSCid held by VSCid register 393 and
identifier VSCid allotted to a following element request
with each other. When identifier VSCid allotted to the
following element request changes, i.e., when control
goes to the processing of a next VSC instruction, VSCid
comparator 394 sends a timing signal (Clear) in order to
clear the content stored in VSC address buffer 32. The
timing signal sent from VSCid comparator 394 is supplied
via inverter 395 to VSC address buffer 32.
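
The clearing behaviour can be sketched in software as
follows. The class VscAddressBuffer and its method names are
hypothetical; the sketch assumes a buffer that holds entries
for one VSC instruction at a time, as in the present
embodiment.

```python
# Software analogy of VSCid-based clearing (illustration only).

class VscAddressBuffer:
    def __init__(self):
        self.entries = []            # (element_number, address) pairs
        self.current_vsc_id = None   # plays the role of VSCid register 393

    def record(self, vsc_id, element_number, address):
        # Plays the role of VSCid comparator 394: a changed identifier
        # means a new VSC instruction, so the old entries are cleared.
        if vsc_id != self.current_vsc_id:
            self.entries.clear()
            self.current_vsc_id = vsc_id
        self.entries.append((element_number, address))

buf = VscAddressBuffer()
buf.record(1, 0, 0x10)
buf.record(1, 1, 0x18)
entries_before = list(buf.entries)
buf.record(2, 0, 0x20)   # first request of the next VSC instruction
```

After the first request carrying a new identifier arrives,
only the entry for that request remains in the buffer.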

WE inhibiting circuit 38 has two OR gates 381, two
first AND gates 382, and two second AND gates 383, which
are as many as the number of VSC address buffers 32.

Each OR gate 381 inhibits the outputting of a
Mem-Write signal for permitting the writing of an
element if, based on the compared results from AND gates
373, the output of any of the corresponding 256 AND
gates 373 becomes active under the conditions that an
element is valid, the VSC instruction is the same, and
identifier VSCid remains unchanged.
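
The reduction performed for one unit can be sketched as
below. This is a simplified software analogy: the function
name is hypothetical, and the conditions on the VSC signal
and identifier VSCid are assumed to have been applied already
when the AND gate outputs were formed.

```python
# Software analogy of the inhibit decision for one unit (illustration only).

def mem_write_permitted(element_valid, and_gate_outputs):
    """and_gate_outputs: booleans corresponding to the 256 AND gates 373."""
    inhibit = any(and_gate_outputs)        # role of OR gate 381
    return element_valid and not inhibit   # Mem-Write is output only if not inhibited
```

A single active AND gate output among the 256 is enough to
suppress the Mem-Write signal for the arriving element.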

Request distributing circuit 33 comprises WE 2x8
switch 331 for sending two Mem-Write signals
corresponding respectively to master unit 1 and slave
unit 2 to the corresponding banks of MM 4, and adr/data
2x8 switch 332 for sending storage addresses and data
sent from master unit 1 and slave unit 2 to the
corresponding banks of MM 4.

Operation of the information processing apparatus
according to the present embodiment at the time a VSC
instruction is issued will be described below with
reference to FIG. 5.

FIG. 5 shows the manner in which the information
processing apparatus shown in FIG. 3 operates under a
VSC instruction. In FIG. 5, it is assumed that CPU 100
comprises single master unit 1 and single slave unit 2.

As shown in FIG. 5, when instruction controller 11
of master unit 1 issues a VSC instruction, master-side
RQ control circuit 12 generates a VSCstart signal and
sends the VSCstart signal to slave-side RQ control
circuit 22.

Then, master-side RQ control circuit 12 and
slave-side RQ control circuit 22 issue respective
element requests of element numbers handled by their
units, read (REG read) the storage address and data
designated by the VSC instruction from vector registers
13, 23, and send the read storage address and data, and
generated request information via PNUs 14, 24 to MMU 200.
It is assumed that master unit 1 issues
even-numbered element requests successively from element
0 to element n, and slave unit 2 issues odd-numbered
element requests successively from element 1 to element
n-1.

For example, if element requests for element n-1
and element n have an instruction to store data at the
same address, it is possible for MMU 200 to receive the
element request for element n prior to the element
request for element n-1.

In the present embodiment, the element number and
storage address corresponding to the following
(later-arriving) element request for element n-1 and the
element number and storage address corresponding to the
preceding (earlier-arriving) element request for element
n are compared with each other by comparing circuit 37
of MMC 3. If it is detected that the storage address for
element n is stored in VSC address buffer 32 before the
storage address for element n-1, then a Mem-Write signal
generated by the element request for element n-1 is
inhibited from being sent out by WE inhibiting circuit
38.

Therefore, the data of element n-1 is not written 
in MM 4, and hence is prevented from overwriting the 
data of element n. 
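
The overall effect can be checked with a compact simulation.
It is a minimal sketch under stated assumptions: the function
mmc_process, the tuple layout, and the unit numbering
(0 = master unit 1, 1 = slave unit 2) are hypothetical, and
only requests of a single VSC instruction are modelled.

```python
# Simplified end-to-end model of the write inhibition (illustration only).

def mmc_process(requests):
    """requests: (unit, element_number, address, data) in arrival order."""
    memory = {}
    buffered = {0: [], 1: []}   # per-unit (element_number, address) entries
    for unit, elm, addr, data in requests:
        other_unit_entries = buffered[1 - unit]
        # Inhibit if the other unit already delivered a larger
        # element number for the same storage address.
        inhibit = any(b_elm > elm and b_addr == addr
                      for b_elm, b_addr in other_unit_entries)
        buffered[unit].append((elm, addr))
        if not inhibit:
            memory[addr] = data
    return memory

# Element n = 4 (master) arrives before element n-1 = 3 (slave).
reversed_arrival = mmc_process([(0, 4, 0x40, "element n"),
                                (1, 3, 0x40, "element n-1")])
normal_arrival = mmc_process([(1, 3, 0x40, "element n-1"),
                              (0, 4, 0x40, "element n")])
```

In both arrival orders the address ends up holding the data
of element n, which is the result required by the VSC
instruction.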

When slave unit 2 issues a final element request,
then slave-side RQ control circuit 22 sends a VSCend
signal indicative of a processing end to master-side RQ
control circuit 12 at the issuance timing of the final
element request. In response to the VSCend signal,
master-side RQ control circuit 12 starts processing a
next VSC instruction that is issued from instruction
controller 11 after having waited for an issuance end of
a final element request of master-side RQ control
circuit 12.

When MMC 3 receives an element request under a
next VSC instruction, MMC 3 clears all the contents of
VSC address buffer 32 which have been stored by the
preceding VSC instruction. At this time, the preceding
VSC instruction and the following VSC instruction are
distinguished from each other using identifier VSCid.

In the present embodiment, the CPU is illustrated
as comprising one set of master unit 1 and slave unit 2
for an easier understanding of the present invention.
However, the CPU often comprises a plurality of sets of
master units 1 and slave units 2 in actual information
processing apparatus.

For example, if the CPU comprises a plurality of
sets of master units 1 and slave units 2 and VSC address
buffer 32 is arranged to hold storage addresses
corresponding to a plurality of VSC instructions, then
the processing speed of the information processing
apparatus can further be increased because the intervals
at which VSC instructions are issued can be shortened.
However, such an arrangement requires as many VSC
address buffers 32 as (the number of sets of master
units 1 and slave units 2) x 2. If it is assumed that
each of master units 1 and slave units 2 simultaneously
processes 256 elements, then the number of entries of
VSC address buffers 32, which is represented by the
following equation, is required:

The number of entries = (the number of sets of
master units 1 and slave units 2) x 512 x (the number of
overlaps of VSC instructions).
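
The equation can be evaluated with a few lines of code. The
function and parameter names are hypothetical; the factor of
512 is expressed as two units per set times 256 elements per
unit, as stated in the text.

```python
# Entry count required for the VSC address buffers (illustration only).

def vsc_buffer_entries(num_sets, overlapping_vsc_instructions,
                       elements_per_unit=256):
    units_per_set = 2   # one master unit and one slave unit per set
    return (num_sets * units_per_set * elements_per_unit
            * overlapping_vsc_instructions)
```

For example, one set holding a single VSC instruction needs
512 entries, whereas four sets with two overlapping VSC
instructions need 4096 entries, which illustrates how quickly
the buffer and its comparators grow.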

If the number of sets of master units 1 and slave
units 2 is increased, then the scale of the circuit
required is greatly increased because components of
comparing circuit 37 have to be added accordingly.
Particularly, if the VSC address buffer is arranged to
hold storage addresses corresponding to a plurality of
VSC instructions, then the scale of the circuit
increases further because additional circuits are
required for the more complex processing.

For the above reason, VSC address buffer 32 is
preferably arranged to hold storage addresses
corresponding to a single VSC instruction. The number
of sets of master units 1 and slave units 2 should
preferably be set to an optimum number for fulfilling
performance requirements of the information processing
apparatus by trading off a reduction in the performance
due to increased intervals at which VSC instructions are
issued and an increase in the scale of the circuit.

Similarly, if the CPU comprises single master unit
1 and a plurality of slave units 2, then VSC address
buffer 32 is preferably arranged to hold storage
addresses corresponding to a single VSC instruction, and
the number of slave units 2 should preferably be set to
an optimum number for fulfilling performance
requirements of the information processing apparatus.

With the information processing apparatus according
to the present embodiment, since adverse effects caused
when a processing sequence of element requests issued
from a plurality of units is reversed are eliminated, a
wrong writing event does not occur even if the CPU
comprises a plurality of asynchronously operating units.
With the CPU comprising a plurality of units, it is
possible to reduce the circuit scale of each unit, and
the yield of the units (LSI circuits) is increased and
the number of external terminals thereof is reduced,
resulting in a reduction in the cost of the information
processing apparatus.

2nd Embodiment: 

An information processing apparatus according to a
second embodiment of the present invention will be
described below with reference to FIGS. 6 through 9.

FIG. 6 shows in block form a CPU in an information
processing apparatus according to a second embodiment of
the present invention. FIG. 7 shows in block form a
main memory controller in the information processing
apparatus according to the second embodiment of the
present invention. FIG. 8 shows in block form a
deadlock detection controller in the main memory
controller shown in FIG. 7. FIG. 9 shows the manner in
which the deadlock detection controller shown in FIG. 8
operates.

In the first embodiment, a single VSC address
buffer is associated with each of the master unit and
the slave unit, and hence the MMC is required to have as
many VSC address buffers as the number of master units
and slave units.

According to the second embodiment, the MMC is
arranged to hold storage addresses designated by element
requests of the master unit and the slave unit in a
single VSC address buffer. Accordingly, request
information, storage addresses, and data sent from the
master unit and the slave unit are commonly received by
a single MMU input RQ register. The VSC address buffer
according to the second embodiment is arranged to hold
storage addresses corresponding to a plurality of VSC
instructions.
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With such an arrangement, since the VSC address 
buffer may possibly suffer a shortage of empty entries 
(overflow) that are required to hold storage addresses , 
an RQ resend request signal for resending elements 
5 corresponding to overflowed VSC instructions is sent 
from the MMC to the master unit and the slave unit. 

The master unit and the slave unit according to the
second embodiment each have a retry buffer for holding
request information, storage addresses and data
corresponding to element requests which have already
been issued. When the master unit and the slave unit
receive an RQ resend request signal from the MMC, the
master unit and the slave unit resend the request
information, storage addresses and data held by the
retry buffer to the MMC. At this time, because the
element numbers from which resending starts may differ
between the master unit and the slave unit, the master
unit and the slave unit start resending from the element
having the smaller element number among the elements to
be resent. For example, if the master unit resends an
element with a smaller element number, then the slave
unit resends the element whose element number is that of
the element resent by the master unit + 1, and if the
slave unit resends an element with a smaller element
number, then the master unit resends the element whose
element number is that of the element resent by the
slave unit - 1.
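
The alignment of the resend start numbers described above can be
sketched as follows. This is only an illustrative model, not the
circuit itself: the function name align_resend_start is hypothetical,
and it assumes that the element handled by the master unit always
immediately precedes the element handled by the slave unit, as the
"+ 1" and "- 1" example suggests.

```python
def align_resend_start(master_start, slave_start):
    """Return the aligned (master_start, slave_start) pair.

    Assumption (not stated as a rule in the text): the slave's element
    number is always the master's element number + 1. Equal inputs are
    not expected under that assumption.
    """
    if master_start < slave_start:
        # Master holds the smaller number: the slave follows it by one.
        return master_start, master_start + 1
    # Slave holds the smaller number: the master precedes it by one.
    return slave_start - 1, slave_start


# Master would restart at element 4, slave at element 9.
print(align_resend_start(4, 9))
# Master would restart at element 10, slave at element 5.
print(align_resend_start(10, 5))
```

In both calls the two units end up restarting from an adjacent pair of
elements that begins at or before the smaller of the two numbers.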

When the master unit and the slave unit according
to the second embodiment receive an RQ resend request
signal from the MMC, the master unit and the slave unit
send a buffer clear signal to the MMC for clearing the
content of the VSC address buffer which corresponds to
an element request of a VSC instruction which has
overflowed the VSC address buffer.

As shown in FIG. 6, master unit 5 according to the
second embodiment comprises control retry buffer 56 for
holding request information corresponding to an element
request that has been issued, address/data retry buffer
57 for holding a storage address and data corresponding
to the element request, and first selector 58 and second
selector 59 for resending request information, a storage
address and data to the MMC when an RQ resend request
signal is received from the MMC.

Master-side RQ control circuit 52 sends request
information corresponding to an element request that has
been issued to control retry buffer 56 and first
selector 58. When an RQ resend request signal is not
received, first selector 58 sends request information
received from master-side RQ control circuit 52 to PNU
54. When an RQ resend request signal is received, first
selector 58 sends request information held by control
retry buffer 56 to PNU 54. Similarly, vector register
53 sends a storage address and data designated by an
element request to address/data retry buffer 57 and
second selector 59. When an RQ resend request signal is
not received, second selector 59 sends a storage address
and data received from vector register 53 to PNU 54.
When an RQ resend request signal is received, second
selector 59 sends a storage address and data held by
address/data retry buffer 57 to PNU 54.
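
The retry buffer and selector path described above can be modeled in
software as follows. This is a minimal sketch under stated
assumptions: the class RetryPath, the function selector, and the
start_index parameter are hypothetical names introduced for
illustration, and the retry buffer is modeled as an unbounded list.

```python
class RetryPath:
    """Hypothetical model of one retry buffer (e.g. 56 or 57)."""

    def __init__(self):
        # Copies of everything that has already been issued.
        self.retry_buffer = []

    def issue(self, item):
        """Normal path: record the item and forward it toward the PNU."""
        self.retry_buffer.append(item)
        return item

    def resend(self, start_index=0):
        """Resend path: replay the held items from start_index onward."""
        return list(self.retry_buffer[start_index:])


def selector(live_item, held_item, resend_requested):
    """Two-input selector: the held item on resend, else the live item."""
    return held_item if resend_requested else live_item


path = RetryPath()
path.issue("rq0")
path.issue("rq1")
print(path.resend(1))                      # replays only the second item
print(selector("live", "held", False))     # normal operation
print(selector("live", "held", True))      # RQ resend request received
```

The same model applies to both the request information path and the
address/data path, since the text describes them as operating
similarly.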

When an RQ resend request signal is received from
the MMC, master-side RQ control circuit 52 sends a start
element number instruction for acquiring a start element
number, which represents the element number of the
element from which resending starts, to slave-side RQ
control circuit 62 of slave unit 6. When the start
element number instruction is received from master-side
RQ control circuit 52, slave-side RQ control circuit 62
returns the start element number of the element that
starts to be resent from slave unit 6 to master-side RQ
control circuit 52 (start element number report).
Master-side RQ control circuit 52 compares the start
element number of the element that starts to be resent
from master unit 5 and the start element number received
from slave-side RQ control circuit 62 with each other,
and corrects the element number that starts to be resent
based on the smaller one of the element numbers. If
necessary, master-side RQ control circuit 52 sends the
corrected element number that starts to be resent to
slave-side RQ control circuit 62. Then, master-side RQ
control circuit 52 sends a buffer clear signal via PNU 54
to the MMC for clearing the content of the VSC address
buffer which corresponds to an element request of a VSC
instruction issued thereby which has overflowed the VSC
address buffer. PNU 54 sends the buffer clear signal to
the MMC before resending request information, storage
addresses, and data to the MMC.

Slave unit 6 is identical in arrangement to master 
unit 5 except for different operations of master-side RQ 
control circuit 52 and slave-side RQ control circuit 62. 

As shown in FIG. 7, MMU input RQ register 71 of MMC 
7 according to the second embodiment comprises the MMU 
input RQ register of the MMC according to the first 
embodiment shown in FIG. 4, with a V2 field added 
thereto for receiving a buffer clear signal sent from 
the CPU. 

VSC address buffer 72 according to the second
embodiment is arranged to hold unit number CPU# and
identifier VSCid for identifying a VSC instruction, in
association with an element number (ELM) and a storage
address (Address). In the present embodiment, in order
to avoid a deadlock state to be described later on, the
number of entries of VSC address buffer 72 is set to
(the number of words simultaneously processed by master
unit 5) + 1 (257 words in the present embodiment).

MMC 7 according to the second embodiment comprises
the MMC according to the first embodiment shown in FIG. 
4, with VSC address buffer controller 80, resend request 
register 81, and deadlock detection controller 82 added 
thereto. 

When VSC address buffer controller 80 detects an 
overflow of VSC address buffer 72, VSC address buffer 
controller 80 sends a resend request signal for causing
master unit 5 and slave unit 6 to resend corresponding 
request information, a storage address and data, to 
resend request register 81 and deadlock detection 
controller 82. 

Deadlock detection controller 82 detects a deadlock 
state in which an overflow of VSC address buffer 72 and 
resending of elements from master unit 5 and slave unit 
6 are repeatedly carried out. When such a deadlock 
state occurs, deadlock detection controller 82 sends a 
delay value (BSYCNT) to shift the timing to resend 
elements to resend request register 81. 

When unit number CPU#, identifier VSCid, and
element number ELM are transferred from MMU input RQ
register 71 in synchronism with the issuance timing of
an element request, and a resend request signal is
received from VSC address buffer controller 80, resend
request register 81 sends an RQ resend request signal to
master unit 5 and slave unit 6 which are the issuing
source of the element request which has overflowed VSC
address buffer 72. At this time, resend request
register 81 attaches a delay value BSYCNT received from
deadlock detection controller 82 to the RQ resend
request signal.

Comparing circuit 77 according to the second
embodiment comprises the element number comparator and
the address comparator according to the first embodiment,
and additionally includes unit number comparator 771 for
comparing unit numbers CPU# corresponding to a preceding
element request and a following element request stored
in VSC address buffer 72 with each other, and identifier
comparator 772 for comparing identifiers VSCid
corresponding to a preceding element request and a
following element request stored in VSC address buffer
72 with each other.

The compared results of unit number comparator 771
and identifier comparator 772 are output via first AND
gate 773 to second AND gate 791 of buffer controller 79.
Second AND gate 791 outputs the result of logical AND of
a buffer clear signal from the V2 field of MMU input RQ
register 71 and an output signal from first AND gate 773.

When buffer control circuit 79 receives a buffer
clear signal from master unit 5 or slave unit 6, buffer
control circuit 79 clears entries in VSC address buffer
72 which agree with unit numbers CPU# and identifiers
VSCid corresponding to a preceding element request and a
following element request. Other structural and
operational details of the information processing
apparatus according to the second embodiment are the
same as those of the information processing apparatus
according to the first embodiment, and will not be
described in detail below. In MMC 7 shown in FIG. 7, a
request distributing circuit, a WE register, and an
adr/data register are omitted from illustration.
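
The buffer clear path described above (unit number comparator 771,
identifier comparator 772, first AND gate 773 and second AND gate
791) can be sketched in software as follows. This is an illustrative
model only: the Entry class and the clear_entries function are
hypothetical names, and the VSC address buffer is modeled as a plain
list rather than as hardware entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One hypothetical VSC address buffer entry."""
    cpu: int        # unit number CPU#
    vsc_id: int     # identifier VSCid
    elm: int        # element number ELM
    address: int    # storage address


def clear_entries(buffer, cpu, vsc_id, clear_signal):
    """Return the entries that remain after a buffer clear request."""
    kept = []
    for entry in buffer:
        unit_match = entry.cpu == cpu          # comparator 771
        id_match = entry.vsc_id == vsc_id      # comparator 772
        both_match = unit_match and id_match   # first AND gate 773
        if clear_signal and both_match:        # second AND gate 791
            continue                           # this entry is cleared
        kept.append(entry)
    return kept


buf = [Entry(0, 1, 0, 0x100), Entry(0, 2, 0, 0x200), Entry(1, 1, 0, 0x300)]
remaining = clear_entries(buf, cpu=0, vsc_id=1, clear_signal=True)
print([(e.cpu, e.vsc_id) for e in remaining])
```

Only the entry whose CPU# and VSCid both agree is removed, and nothing
is removed while the buffer clear signal is inactive.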

The deadlock state referred to above will be 
described below. 

If the CPU comprises single master unit 5 and
single slave unit 6 and each of master unit 5 and slave
unit 6 is arranged to simultaneously process 256
elements, then it is possible for master unit 5 to issue
256 element requests earlier than slave unit 6 due to a
signal propagation delay and various issuance
limitations on element requests. When all element
requests are issued to the same MMC 7, if the number of
entries of VSC address buffer 72 is 256, then all the
entries of VSC address buffer 72 are used only to
process element requests issued from master unit 5.
Inasmuch as the processing of a first element request
issued from slave unit 6 causes an overflow of VSC
address buffer 72 in this state, MMC 7 sends an RQ
resend request signal to master unit 5 and slave unit 6,
which start resending elements from the first element.
If master unit 5 issues 256 element requests again
earlier than slave unit 6 at this time, an overflow of
VSC address buffer 72 and the resending of elements from
master unit 5 and slave unit 6 are repeatedly carried
out, so that the VSC instructions will never be
processed. Such a state is referred to as a deadlock.
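
The overflow that triggers this resend cycle can be illustrated with
a small model. This is a sketch under a simplifying assumption that
is not stated in the text: no entry of the buffer is released while
the requests arrive. The function name first_overflow and the list
representation of the issue order are hypothetical.

```python
def first_overflow(capacity, issue_order):
    """Return the index of the first request that finds no empty entry.

    Returns None if every request fits. Assumption: entries are never
    released while the requests in issue_order arrive.
    """
    used = 0
    for index, _unit in enumerate(issue_order):
        if used >= capacity:
            return index
        used += 1
    return None


# Master unit issues all 256 element requests before the slave's first.
order = ["master"] * 256 + ["slave"]
overflow_at = first_overflow(256, order)
print(overflow_at, order[overflow_at])
```

With 256 entries the request that overflows is the first one from the
slave unit, which is the situation that repeats in the deadlock.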

In order to avoid a deadlock state, the number of
entries of VSC address buffer 72 may be set to (the
number of elements simultaneously processed by master
unit 5) + 1 (257 in the present embodiment).

If the CPU comprises a plurality of sets of master
units 5 and slave units 6 and each of the master units 5
and slave units 6 simultaneously processes 256 elements,
then master unit 5 in each set may possibly issue 256
element requests earlier than slave unit 6. Therefore,
if the number of entries of VSC address buffer 72 is
smaller than (the number of elements simultaneously
processed by master unit 5) x (the number of sets of
master units 5 and slave units 6), then there is a
possibility of a deadlock state in which an overflow of
VSC address buffer 72 and resending of elements from
master unit 5 and slave unit 6 are repeated (hereinafter
referred to as resend cycle).

To avoid the deadlock state, the number of entries
of VSC address buffer 72 may be set to (the number of
elements simultaneously processed by master unit 5) x
(the number of sets of master units 5 and slave units 6)
+ 1.
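
The sizing rule stated above amounts to simple arithmetic, sketched
here for concreteness. The function name min_vsc_entries and its
parameter names are hypothetical; the four-set figure in the second
call is only an illustrative input, not a configuration taken from
the embodiment.

```python
def min_vsc_entries(elements_per_master, num_sets=1):
    """Entries needed by the rule: elements x sets + 1."""
    return elements_per_master * num_sets + 1


print(min_vsc_entries(256))        # single set, as in the embodiment
print(min_vsc_entries(256, 4))     # hypothetical four-set CPU
```

The entry count grows linearly with the number of sets, which is the
hardware cost that the deadlock detection controller is intended to
avoid.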

However, increasing the number of entries of VSC
address buffer 72 just for the purpose of avoiding a
deadlock state which rarely occurs in practice invites
an increase in hardware, and hence results in a new
problem in that the cost of the MMU rises.

According to the present embodiment, MMC 7 has 
deadlock detection controller 82 shown in FIG. 7 for 
avoiding a deadlock state while suppressing an increase 
in hardware. 

When the same element number corresponding to the
same VSC instruction is resent from the same master unit
5 and slave unit 6 within a predetermined period,
deadlock detection controller 82 delays the issuance
timing of an element request from master unit 5 and
slave unit 6 at the time the element number is resent
for the second time, thus disturbing the resend cycle of
the element request and thereby getting out of the
deadlock.

As shown in FIG. 8, deadlock detection controller
82 comprises counter circuit 821 for measuring a
deadlock detection period in which to detect whether a
deadlock state has occurred or not, deadlock judging
circuit 822 for comparing unit number CPU#, identifier
VSCid, and element number ELM corresponding to a storage
address and data that have been resent with respective
preceding values thereof, and judging that a deadlock
state has occurred if all of unit number CPU#,
identifier VSCid, and element number ELM agree with the
preceding values, random number generating circuit 823
for generating a delay value BSYCNT to shift the timing
to issue an element request from master unit 5 and slave
unit 6 when it is to be resent, and detection period
control circuit 824 for controlling operation of counter
circuit 821 and deadlock judging circuit 822.

Counter circuit 821 has a selector, a register
(detection period cnt), a subtractor (- 1), and a
comparator (= 0). When counter circuit 821 starts
counting, it reads the value (constant) of a deadlock
detection period signal supplied from an external
circuit into detection period cnt, starts counting down
(decrementing) from the read value, and outputs its
resultant count to detection period control circuit 824
when a count value = 0 is detected.
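
The behavior of counter circuit 821 can be sketched as a down counter.
This is an illustrative model only: the class name
DetectionPeriodCounter and its method names are hypothetical, one call
to tick stands for one clock cycle, and the period is assumed to be at
least 1.

```python
class DetectionPeriodCounter:
    """Hypothetical model of the selector/register/subtractor/comparator."""

    def __init__(self, period):
        self.period = period   # constant from the detection period signal
        self.cnt = 0           # the "detection period cnt" register
        self.running = False

    def start(self):
        """Selector loads the constant into the register."""
        self.cnt = self.period
        self.running = True

    def tick(self):
        """Advance one cycle; return True when the count reaches 0."""
        if not self.running:
            return False
        self.cnt -= 1                  # subtractor (- 1)
        if self.cnt == 0:              # comparator (= 0)
            self.running = False
            return True
        return False


counter = DetectionPeriodCounter(3)
counter.start()
print([counter.tick() for _ in range(4)])
```

The counter reports the end of the detection period exactly once and
stays idle until it is started again.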

Deadlock judging circuit 822 has a resend id
register (REG) and a comparator (=). Deadlock judging
circuit 822 holds CPU#/VSCid/ELM transferred from MMU
input RQ register 71 in the resend idREG when an
element is resent, compares CPU#/VSCid/ELM transferred
when an element is resent next time with the values held
by the resend idREG, and makes the deadlock detection
signal active when all the CPU#/VSCid/ELM agree with the
values held by the resend idREG.
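
The comparison performed by deadlock judging circuit 822 can be
sketched as follows. This is a minimal model under an assumption: the
held value is overwritten on every resend, which is one reading of
the text and not a confirmed detail of the circuit. The class name
DeadlockJudge and its method names are hypothetical.

```python
class DeadlockJudge:
    """Hypothetical model of the resend id register and comparator."""

    def __init__(self):
        self.resend_id = None   # nothing held until the first resend

    def observe_resend(self, cpu, vsc_id, elm):
        """Return True if this resend matches the held CPU#/VSCid/ELM."""
        current = (cpu, vsc_id, elm)
        detected = self.resend_id == current   # comparator (=)
        self.resend_id = current               # assumed: always re-held
        return detected

    def reset(self):
        """Clear the held value, as the detection period control does."""
        self.resend_id = None


judge = DeadlockJudge()
print(judge.observe_resend(0, 1, 0))   # first resend: nothing to match
print(judge.observe_resend(0, 1, 0))   # same element again: detected
print(judge.observe_resend(0, 1, 2))   # different element: not detected
```

All three values must agree for a detection, so a resend of a
different element of the same VSC instruction does not count.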

Detection period control circuit 824 has a latch 
circuit, an OR gate, and an AND gate. If deadlock 
judging circuit 822 detects that all the CPU#/VSCid/ELM 
agree with the values held by the resend idREG, 
detection period control circuit 824 causes counter 
circuit 821 to start counting. When the counting is 
finished, i.e., when the deadlock detection period is 
finished, or the deadlock detection signal is rendered 
active, detection period control circuit 824 resets the 
detection period cnt of counter circuit 821 or resets 
the resend idREG of deadlock judging circuit 822. 

With the above arrangement, as shown in FIG. 9, 
when an overflow of VSC address buffer 72 occurs due to 
an element request issued from any set of master unit 5 
and slave unit 6 and a resending of an element (RQ 
resending) occurs, detection period control circuit 824 
makes the deadlock detection signal active, and causes 
counter circuit 821 to start counting, thereby starting 
a process of detecting a deadlock. The process of 
detecting a deadlock is put to an end when the output 
value of counter circuit 821 in detection period cnt becomes 0.



The length of the deadlock detection period is set
to the longest value (constant) of the period of a
resend cycle generated by one set of master unit 5 and
slave unit 6, and the set value is stored in a ROM (not
shown) of the information processing apparatus. When
the information processing apparatus is turned on, the
stored value is supplied as a deadlock detection period
signal from the ROM to counter circuit 821.

Deadlock judging circuit 822 stores unit number
CPU#, identifier VSCid, and element number ELM
corresponding to an element request which has been
resent when a deadlock detecting period is started, in
the resend idREG. Deadlock judging circuit 822 compares
unit number CPU#, identifier VSCid, and element number
ELM at the time the element request is resent for the
second time, with the values stored in the resend idREG,
and judges that a deadlock state has occurred (detects a
deadlock) when all of unit number CPU#, identifier
VSCid, and element number ELM agree with the values.

At this time, a deadlock detection signal which is
the output signal from deadlock judging circuit 822
becomes active, and a delay value BSYCNT generated by
random number generating circuit 823 is stored in resend
request register 81. Resend request register 81
attaches the delay value BSYCNT to the RQ resend request
signal, and sends it to master unit 5 and slave unit 6
(RQ resend request). At the same time, detection period
control circuit 824 resets the deadlock detection signal,
ending the deadlock detecting process.

Master-side RQ control circuit 52 of master unit 5 
and slave-side RQ control circuit 62 of slave unit 6, 
which have received the RQ resend request signal, shift
the issuance timing of an element request by the delay 
value BSYCNT, and resend a designated element. 

By carrying out the above process, the second 
resend cycle is made longer than the first resend cycle, 
disturbing the repetitive period of resend cycles to get 
out of the deadlock state. 
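
The effect of the delay value BSYCNT on the reissue timing can be
sketched as follows. This is an illustrative model under several
assumptions that the text does not state: the delay is drawn
uniformly from 1 to max_delay cycles, a delay of 0 is attached when
no deadlock has been detected, and timing is counted in whole cycles.
The function names and the max_delay parameter are hypothetical.

```python
import random


def make_rq_resend_request(deadlock_detected, rng, max_delay=15):
    """Build an RQ resend request with a BSYCNT delay attached.

    Assumptions: the delay is uniform in 1..max_delay when a deadlock
    is detected, and 0 otherwise.
    """
    bsycnt = rng.randint(1, max_delay) if deadlock_detected else 0
    return {"resend": True, "bsycnt": bsycnt}


def reissue_cycle(base_cycle, request):
    """Cycle at which a unit reissues, shifted by the attached delay."""
    return base_cycle + request["bsycnt"]


rng = random.Random(0)   # seeded only to make the sketch repeatable
normal = make_rq_resend_request(False, rng)
delayed = make_rq_resend_request(True, rng)
print(reissue_cycle(100, normal))
print(reissue_cycle(100, delayed) > reissue_cycle(100, normal))
```

Because the delayed reissue always falls later than the undelayed one,
the second resend cycle is longer than the first, which is what
disturbs the repetition.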

Since the above process of avoiding a deadlock is a 
process of disturbing the period of resend cycles 
generated between a plurality of sets of master units 5 
and slave units 6, the process fails to avoid a deadlock 
state that occurs if the CPU comprises single master 
unit 5 and single slave unit 6. Consequently, the 
number of entries of VSC address buffer 72 should 
preferably be set to at least (the number of elements 
simultaneously processed by master unit 5) +1. 

The information processing apparatus according to 
the present embodiment is capable of using VSC address 
buffer 72 efficiently. Therefore, the number of VSC 
address buffers 72 that are used can be reduced, thereby 
reducing the cost of MMC 7.



While preferred embodiments of the present
invention have been described using specific terms, such
description is for illustrative purposes only, and it is
to be understood that changes and variations may be made
without departing from the spirit or scope of the
following claims.